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We formulate and optimize a computational search strategy for detecting gravitational waves from 
isolated, previously-unknown neutron stars (that is, neutron stars with unknown sky positions, spin 
frequencies, and spin-down parameters). It is well known that fully coherent searches over the 
relevant parameter-space volumes are not computationally feasible, and so more computationally 
efficient methods are called for. The first step in this direction was taken by Brady&Creighton 
(2000), who proposed and optimized a two-stage, stack-slide search algorithm. We generalize and 
otherwise improve upon the Brady-Creighton scheme in several ways. Like Brady&Creighton, we 
consider a stack-slide scheme, but here with an arbitrary number of semi-coherent stages and with 
a coherent follow-up stage at the end. We find that searches with three semi-coherent stages are 
significantly more efficient than two-stage searches (requiring about 2-5 times less computational 
power for the same sensitivity) and are only slightly less efficient than searches with four or more 
stages. We calculate the signal-to-noise ratio required for detection, as a function of computing 
power and neutron star spin-down-age, using our optimized searches. 

PACS numbers: 04.80.Nn, 95.75.Pq, 97.60.Gb 
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I. INTRODUCTION 

In analyzing data from Earth-based and space-based 
gravitational-wave (GW) detectors, we will be computa- 
tionally limited in performing certain types of searches- 
especially searches for long-lived signals described by sev- 
eral unknown parameters. For such signals, the number 
of templates signals required to discretely cover the pa- 
rameter space (at useful resolution) typically increases 
rapidly as a function of the observation time. For ground- 
based detectors, such as LIGO, a well-known example is 
the search for nearly periodic GWs from unknown, iso- 
lated, rapidly rotating neutron stars (NSs). We will refer 
to NSs that are continuously emitting GWs as "GW pul- 
sars". By "unknown", we mean that the GW pulsar's 
position on the sky, frequency, and frequency derivatives 
are all unknown, and so must be searched over. (The 
NS could be unknown either because it is elect romagnet- 
ically inactive, or because its electromagnetic emission 
does not reach us-e.g., because we do not intersect its 
radio pulsar beam.) Brady et al. [1] showed that straight- 
forward matched-filter searches for unknown GW pulsars 
would be severely computationally limited; for example, 
searches for young, fast NSs (NSs with GW frequencies as 
high as 1 kHz and spin-down ages as short as 40 yr) would 
be limited to observation times of order one day. To ad- 
dress this problem, Brady & Creighton [2] (henceforth 
referred to as BC) were the first to consider hierarchi- 
cal, multistage, semi-coherent searches for GW pulsars. 
Briefly, a semi-coherent search is one where a sequence 
of short data stretches are all coherently searched, using 
some technique akin to matched filtering, and then the 
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resulting powers from the different stretches are summed. 
The method is only "semi-coherent" because powers are 
added instead of complex amplitudes; i.e., information 
regarding the overall phase of the signal in different 
stretches is discarded. This allows one to use a much 
coarser grid on parameter space than would be required 
in a fully coherent search of the same data. BC de- 
veloped a "stack-slide" method for summing the pow- 
ers along different tracks in the time-frequency plane, 
in close analogy to the "power stacking" method (some- 
times called the Radon transform) used in radio pulsar 
searches. The basic idea of their two-stage search is 
to identify a list of "candidates" (basically, promising- 
looking regions in parameter space) in the first stage, 
using some fraction of the available data, and then to 
"follow up" those candidates using more data in the sec- 
ond stage. In their scheme, both the first and second 
stages are semi- coherent. 

In this paper we revisit the problem of constructing 
efficient, hierarchical searches for GW pulsars. We build 
on the BC treatment, but we also significantly general- 
ize and otherwise improve upon their work. The most 
important improvements are that we consider searches 
with n semi-coherent stages (not just 2), with surviving 
candidates being winnowed at each stage, and we add 
on a fully coherent final stage to verify or debunk any 
remaining candidates. We also explicitly account for the 
unknown polarization of the source, while this compli- 
cation was omitted for in BC, for simplicity. Other im- 
portant differences between our work and theirs will be 
highlighted below. 

This paper is organized as follows. Section II sets up 
notation, describes the expected signal from an isolated 
GW pulsar, and reviews the stack-slide algorithm. Out- 
general multistage strategy for searching through large 
parameter spaces for GW pulsars, using a combination 
of semi-coherent methods and coherent methods, is ex- 



2 



plained in Section III. Our general search scheme con- 
tains a fair number of free parameters (the number and 
duration of the coherently analyzed stretches in each 
semi-coherent stage, as well as the coarseness of the dis- 
crete grid used to cover the parameter space of sought- 
for signals), which we can adjust to make the search as 
efficient as possible. Our general scheme for perform- 
ing this optimization is described in Section IIIB. Sec- 
tion IV develops all the formulae we need to evaluate the 
computational cost of any of our strategies, for any de- 
sired sensitivity. More specifically, section IV A reviews 
the template-counting formulae developed in BC; sec- 
tion IV B develops the equations relating the thresholds 
that candidates must pass at different stages (to advance 
to the following stage) to the false dismissal (FD) rates 
at those stages, and hence to the overall sensitivity of the 
search; and section IV C derives estimates for the domi- 
nant computational cost of each part of the search. Sec- 
tion V describes our results: the optimal strategy (within 
our general scheme) and its sensitivity. Section VI con- 
cludes with a summary of our main results and a discus- 
sion of open issues and future work. 



II. NOTATION AND BASICS 
A. The signal from a GW pulsar 

Here we briefly review the expected GW signal from 
a spinning neutron star. Let x(t) be the output of some 
detector. In the absence of any signal, x{t) is just noise 
n(t), which wc shall assume to be a stationary, Gaussian 
stochastic process with zero mean. In the presence of a 
signal, we have 

x(t) = n{t) + h(t) (1) 

where the signal h(t) is a deterministic function of time. 
We assume that the GW pulsar is isolated and at rest 
with respect to us, so that effects due to its motion can 
be neglected. (More precisely, we assume these effects 
can absorbed into an overall Doppler shift, and so are 
unobservable.) Let i S3b be time measured in the Solar 
System Barycenter (SSB) frame. The form of h{t) in 
this frame is a constant-amplitude sinusoid with phase 
given by 

S a 

Ht seb ) = $ +2^/oAt 3Sb + 27rV J * (Ai ssb ) fe+1 (2) 

kTi ( fc + 1 ) ! 

where Ai ssb = t ssb — , with if° b being a fiducial start 
time; <&o, fo and fk are respectively the phase, frequency, 
and spin-down parameters at the start time, and s is the 
number of spin-down parameters that we search over. As- 
suming that the pulsar is isolated and emitting GWs due 
to a small deviation from axisymmctry, the waveforms 



for the two polarizations are 

h+ = i/i (l + cos 2 i)cos$(t) , (3) 
h x = h cost sin $(i) (4) 

where ho represents the angle-independent amplitude of 
the wave, i is the angle between the spin-axis of the pul- 
sar and the direction of the waves' propagation, and the 
frequency / = $/27r of the emitted GWs is equal to twice 
the rotational frequency of the star. 

Let n be the unit vector pointing from the Solar Sys- 
tem toward the pulsar, r(t) be the position of the detector 
in the SSB frame, and v(t) be its velocity with t being 
the time in the detector frame. Ignoring relativistic cor- 
rections [19], a wave reaching the Sun at time t SBb will 
reach the detector at time 

t = t, b -^. (5) 

c 

As seen from Eqs. (2) and (5), to a good approximation, 
the instantaneous frequency of the signal as seen by the 
detector is given by the familiar Doppler shift expression 

/W=/(*)(l + ^) (6) 

where f{t) is the instantaneous frequency of the signal in 
the SSB frame, and is given by 

/w = /o+E^( A u fc - a) 

Eqs. (6) and (7) describe the frequency modulation of 
the received signal. The received signal is also amplitude 
modulated by the time-changing antenna pattern of the 
detector as it is carried along by the Earth's rotation. 
The received signal h(t) is a linear combination of h + 
and h x : 

h(t) = F+ (n, il>)h+ (t) + F x (n, ij,)h x (t) (8) 

where ip is the polarization angle of the signal, and F + , x 
are the antenna pattern functions. Due to the motion of 
the Earth, the F + , x depend implicitly on time: 

F+(t) = a(t)cos2ip + b(t)sm2ip (9) 
F x (t) = b(t) cos 2ip - a(t) sin 2tp (10) 

where the functions a(t) and b(t) are independent of ip. 
(In these equations, the angle between the arms of the 
detector is taken to be 7r/2.) We refer the reader to [3] 
for explicit expressions for a(t) and b(t). 

The modulated frequency is described by the s + 3 pa- 
rameters consisting of f and A := (n, {fk}k=i ...s)i we 
shall often denote the pair (/o,A) by the boldface sym- 
bol A. Apart from the parameters A, the waveform (8) 
depends on other parameters: the pulsar's orientation t, 
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polarization angle ip, the initial phase $o, an d the am- 
plitude ho- The optimal matched filter statistic [3] for 
detecting the waveform must, in principle, search over 
the entire parameter space (A, l, tp, 3>o, h ). However, it 
turns out that the computationally challenging part of 
the search involves just the A; the optimization of over 
(i,ip,&o,h ) can be done analytically, by means of the 
JF-statistic defined in [3]. The JF-statistic is the optimal 
matched filter statistic maximized over (i,ip,^Q,ho). ft 
is therefore only a function of (/o, A) and it is given by 



^(/o,A)=4 



B\F a \ 2 



A\F b \ 2 - 2CU(F a F*) 



ATS n (fo)D 



(11) 



where S n (f) is the single-sided power spectral density of 
the detector noise n(t), and 



F a 



AT/2 

-AT/2 
AT/2 



;(i)a(i)e- 4 * (t;A) dt , 



(12) 



x{t)b{t)e-^ x) dt , (13) 

-AT/2 

A = (a\\a), B = (b\\b), (14) 

C = (a\\b), D = AB-C 2 . (15) 



Here we have used the notation 

2 r AT/2 



(x\\y) 



AT 



AT/2 



x(t)y(t)dt. 



(16) 



In cases where the amplitude modulation can be ignored 
(e.g., for short data segments, << 1 day long, where 
the a(t) and b(t) can be approximated as constant), we 
see that T is proportional to the demodulated Fourier 
transform which matches just the phase evolution: 



•Fcx |A(/,A)| 2 



where 



/AT/2 
-AT/2 



-AT/2 



(17) 



(18) 



The JF-statistic is the optimal (frequentist) detection 
statistic for GW pulsars, and it is at the core of some al- 
goriths currently used to search for GW pulsars in LIGO 
and GEO data [4]. Some important properties of the T- 
statistic are reviewed in section IV. B, and a more detailed 
description can be found in [3]. 



spectrum of each segment. For now we assume each seg- 
ment is sufficiently short that the signal frequency re- 
mains confined to a single discrete frequency bin. If there 
is a signal present, it will most likely be too weak to 
show up in a single segment with any significant signal-to- 
noise-ratio (SNR). However, we can increase the SNR by 
adding the power from the different segments. We must 
not use the the same frequency bin from each segment, 
but rather must follow the frequency evolution given by 
Eq. (6). Thus, we 'stack' the power after 'sliding' each 
segment in frequency space. Note that the sliding de- 
pends on A. Thus, in practice, we choose a grid in the 
space of A's and the sliding is done differently at each 
grid point. 

As described above, the sensitivity of the stack-slide 
algorithm is restricted due to the length of AT; we should 
not take AT to be too large, since then we would lose 
SNR due to the signal power being spread over several 
frequency bins. However, we can gather all the signal 
power back into a single bin by taking account of the 
Doppler modulation and spin-down while calculating the 
spectrum of a segment; i.e., we de- modulate each data 
segment before summing. 

With these concepts at hand, we can now describe the 
stack-slide search for the .T 7 - statistic. The strategy is very 
similar to the power summing method described earlier 
in this section. Again we break up the data of length 
T usod into N segments, each of length AT = T uscd /N. 
We choose a point A^ in parameter space, and demod- 
ulate the signal accordingly. We calculate f(f, A<j) as a 
function of the frequency for each segment and add the 
^"-statistic values after sliding the different segments in 
frequency space appropriately. 

As explained in BC, the resolution of sky- and spin- 
down-space that suffices for the demodulation is not fine 
enough for for the stack-slide step. Thus at each stage we 
two grids on parameter space: a coarse one for perform- 
ing the short-segment demodulations and a fine one for 
sliding and stacking the short-segment results. We refer 
the reader to [5] and the appendix of [6] for a detailed 
derivation of the formula relating the required amount of 
sliding to the parameters \d- 



III. A MULTISTAGE HIERARCHICAL SEARCH 



A. The general algorithm 



B. The Stack-Slide algorithm 

The stack-slide algorithm is best described with refer- 
ence to the Doppler shift formula of Eq. (6). Imagine 
we have a data stream x(t) covering an observation time 
T uaod , and we wish to search for a GW pulsar with some 
parameters A. We break up the data into N smaller seg- 
ments of length AT = T usod /N, and calculate the Fourier 



The stack-slide search algorithm described in the pre- 
vious section has two components: 1) calculation of the 
^-statistic for data stretches of length AT, 2) summation 
of the resulting T values along the appropriate tracks in 
the time-frequency plane. (If there are TV coherently ana- 
lyzed segments, then the sums have N terms.) If we had 
unlimited computational resources, we would simply do 
a fully coherent search on all the data; i.e., set N = 1 and 
take AT to be the entire observation time. However, the 
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number of templates required for a fully coherent search 
increases as a high power of AT, making this impractical 
for all-sky searches. 

To illustrate this point, consider an all-sky search for 
young, fast pulsars, i.e., GW pulsars that have a spin- 
down age as short as r min = 40 yr and that emit GWs 
with frequency up to / max = 1000 Hz. Let us assume 
that we have 30 days of data available to us. Imagine two 
different ways of looking for this pulsar: a full 30— day co- 
herent integration versus a semi-coherent method where 
the available data is broken up into 30 equal segments. 
The formula for the number of templates required for 
these searches is given below in Eq (30). It turns out 
that the full coherent search requires ~ 4.2 x 10 15 tem- 
plates if we are to not lose more than 30% of the signal 
power. On the other hand, the semi-coherent search re- 
quires only <~ 1.5 x 10 11 templates for the same allowed 
fractional loss in signal power. The ratio of the the num- 
ber of templates required for the two types of searches 
increases rapidly with the observation time; for instance, 
for an observation time of 40 days, the corresponding 
numbers are - 5.5 x 10 16 and - 8.3 x 10 11 for the full 
coherent and semi-coherent searches respectively. 

As illustrated by the above example, semi-coherent 
searches for unknown GW pulsars are a compromise 
forced upon us by limited computing power. Such 
searches will remain computationally limited for the fore- 
seeable future, so it behooves us to organize them as ef- 
ficiently as possible. In this paper we consider a class 
of multistage, hierarchical search algorithms. Since our 
main "problem" is the large volume of parameter space 
we need to search over, the basic idea behind these algo- 
rithms is to identify and discard unpromising regions of 
parameter space as fast as possible-without discarding 
real signals. The type of scheme we consider is illus- 
trated schematically in Fig. 1. The first stage is a semi- 
coherent search through some fraction of the available 
data. A threshold is set, and candidates exceeding this 
threshold are passed to the next stage. The second stage 
is similar to the first, but includes additional data and 
generally entails a finer resolution of parameter space. 
(The latter means that any candidate that survives the 
first semi-coherent stage gives rise to a little crowd of 
nearby candidates that are examined in the second semi- 
coherent stage.) Any candidate that exceeds the second- 
stage threshold is passed on to the third stage, and so 
on. In an (n + l)-stage search, any candidate surviving 
all n semi-coherent selections is subjected to a final, co- 
herent search (which we consider the (n + l) th stage); if 
the final, coherent threshold is exceeded, then a detection 
is announced. We impose as a constraint that the false 
alarm (FA) rate for the entire search must be < 1%; i.e., 
if the data is actually just noise, then the probability that 
a detection is announced must be < 1%. For reasons ex- 
plained below, in realistic examples this inequality is all 
too easy to satisfy ; the actual FA rate for our optimized 
searches is typically smaller than 1% by many orders of 
magnitude. 



Read in data 



Break up data into 
smaller segments 



Analyze each segment 
coherently 



Combine segments 
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(e.g. stack-slide or Hough etc.) 



Read in more data 



Select candidates in 
parameter space 
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Final follow-up 
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fully coherently 



Announce detection 
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FIG. 1: A hierarchical scheme for the analysis of large param- 
eter space volumes for continuous wave searches. Each step 
analyzes only those regions in parameter space that have not 
been discarded by any of the previous steps. 



In the end, our search will be able to detect a GW 
pulsar signal whose rms strength (at the detector) h RMS 
exceeds some threshold value h th (with a false dismissal 

rate of 10 15%). We can think of l/h th as the search's 

sensitivity. We will optimize our search to get the maxi- 
mum sensitivity for any given computing power or, equiv- 
alently, to find the minimum computer power necessary 
to attain any given sensitivity. 

The problem of optimizing a semi-coherent, hierarchi- 
cal search scheme for GW pulsars was first studied by 
BC. The present study builds upon the BC formalism, 
but there are also some important differences. We call 
attention to the following ones: 

1) BC consider a hierarchical search consisting of exactly 
two semi-coherent stages. In the present work, we 
consider a search consisting of an arbitrary num- 
ber of semi-coherent steps, plus a fully coherent, 
"follow-up" stage (utilizing all the available data) 
to assess the significance of any surviving candi- 
dates. The effect of the final, follow-up stage is to 
ensure that the overall false alarm rate (fixed at 
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exactly 1% by BC) is greatly reduced and, for all 
practical purposes, ceases to be a constraint. 

2) In BC's second semi-coherent stage, all the data used 

in the first-stage is reanalyzed, along with some 
"fresh" (as yet unanalyzed) data. A priori, it is not 
clear whether this strategy is more efficient than 
one in which each semi-coherent stage analyzes only 
fresh data. (E.g, the first stage analyzes 20 days of 
data and generates candidates, the second semi- 
coherent stage searches for those candidates in the 
next 50 days of data and generates a list of candi- 
dates that have still "survived" , these survivors are 
searched for again in the next 150 days of data, etc.) 
In this paper, we investigate both kinds of schemes: 
schemes where previously analyzed data is always 
recycled into subsequent stages, and schemes where 
each semi-coherent stage analyzes only fresh data. 

3) For simplicity, BC ignored the fact that the GWs have 

two possible polarizations (in effect pretending that 
the detectors measure a scalar wave) . This is a rea- 
sonable approximation when estimating the num- 
ber of grid-points needed to cover the parameter 
space, but not, say, when trying to estimate the 
FA and FD rates as a function of the threshold 
at some intermediate stage in the search. (Roughly 
speaking, scalar waves with the same matched-filter 
SNR would be easier to detect than actual GWs, 
since with GWs the full SNR is "split" between 
the two polarizations, in a way that is unknown a 
priori.) In this paper we aim to make realistic es- 
timates of a GW pulsar's detectability for a given 
matched-filter SNR (and given region of parameter 
space to be searched over), so we take polarization 
into account wherever it makes a significant differ- 
ence. In practice, this just means that we use the 
^"-statistic, Eq. (11), as our detection statistic. 

4) When estimating computational costs, BC assume 

that the demodulations will be done using stro- 
boscope re- sampling, a method modeled closely 
on the FFT algorithm. A different demodulation 
method, which we shall refer to as the SFT method, 
is currently being used by the GW pulsar search 
codes in the LIGO Scientific Collaboration (LSC) 
software library [7]. The SFT method takes as its 
input a short FFT database (FFT'ed sets of short- 
time data stretches) , and can be more efficient than 
stroboscopic re-sampling in cases where only a nar- 
row frequency range of the demodulated time series 
is of interest. In this paper we explore the possi- 
bility of using different demodulation methods at 
different stages of the search, and attempt to find 
the most efficient combination. 

All the above points will be elaborated on in later sections 
of the paper. 



B. The general optimization scheme 

In this section we further discuss our search algorithm 
and its optimization. First we establish some notation. 
Let n be the total number of semi-coherent stages. Let 
iV"W be the number of stacks used in the i th stage and 
ATM be the length of each stack; the superscript i will 
always refer to the i th semi-coherent stage. The resolu- 
tion of the template grid used to cover parameter space 
is given in terms of the maximum fractional mismatch in 
signal power pM x [8] . Our detection statistic is pW ; the 
sum of the T values from the different stacks (obtained 
after sliding appropriately): 

P« = £*f- (19) 

fe=i 

Denote the distribution of pW in the absence of any signal 
by p(p^). In the presence of a GW signal of amplitude 
h RMS , let the distribution of pW at the gridpoint near- 
est the actual signal be p(pW|/i RMS ,pWj. Let be the 
i th -stage threshold, which a candidate must exceed to ad- 
vance to the next stage, that we use to reject candidates 
or advance them to the next stage. The i th -stage FA rate 
(per candidate) and FD rate (per candidate) (3^ are 
given by 

p(p)dp, (20) 

• th 

{z) (ptt;h RMS ,p±) = / p(p\h RMS ,^l)dp.(2l) 
Jo 

Practically identical formulae apply to the final, coherent 
stage as well. 

Again, we require of any search algorithm that, at 
the very end of the search, it results in a false detec- 
tion less than 1% of the time. Given this constraint, we 
parametrize the search's sensitivity by the signal ampli- 
tude hth such that an embedded signal with h RMS > hth 
would be detected ;> 85 — 90% of the time. We enforce 
the latter condition as follows. We set the first-stage 
threshold h th such that a signal of amplitude h RMS = h th 
will pass to the second stage 90% of the time. At all sub- 
sequent stages we set the threshold such that the same 
signal with strength h RMS = h th has a 99% chance of 
passing to the next higher stage. That is, we adjust the 
the i th -stage threshold p$ so that @W = 0.10, while 
/?W = 0.01 for % > 1, and (i^ = 0.01 as well. The mo- 
tivation behind making (3^ lower than /?W for i > 1 is 
the following: We believe that a computationally efficient 
algorithm will have the property that a true signal that 
is strong enough to pass the first-stage threshold should 
generally pass over all the others. Any source that is not 
sufficiently strong to make it through to the end of the 
detection pipeline should be discarded as soon as possi- 
ble, so as not to waste computing power. This reflects 
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the basic idea behind our hierarchical searches: to elim- 
inate unpromising regions of parameter space as quickly 
as possible, so that computational resources can be fo- 
cused on the more promising regions. Basically, the first- 
stage threshold determines the sensitivity of the whole 
search, and subsequent steps whittle down the number 
of candidates (i.e., the number of small patches in pa- 
rameter space that perhaps contain a true signal) until 
any remaining patches can be fully, coherently analyzed. 

To fully specify our search algorithm, we have to choose 
the parameters T = (n, {N^}, {AT«}, {fi^U, 
where n is the number of semi-coherent stages and i = 
1, ...,n. In doing so, we are subject to certain require- 
ments or constraints: 

• The total amount of data available is no more that 
some T max (say, 1 year). 

• We wish to detect (with ~ 90% FD rate and < 1% 
overall FA rate) any unknown signal of amplitude 
h RMS greater than h th (say, 1(T 26 ). 

Our task is to choose the parameters T that minimize 
the total required computational power P, subject to 
the above constraints. We arrive at a cost function 
P(h t h), the computational cost of reaching any given 
sensitivity level. (Really, P is function of the product 
hf h T maK /S„(f), but we are regarding T max and S n (f) as 
fixed.) We can immediately invert this function to de- 
termine hth(P), the sensitivity achievable for any given 
computing power. 

Let us first deal with the constraint on the total 
amount of data. We are going to consider simultane- 
ously two different modes of all-sky searches. In "data 
recycling mode," at each stage we start back at the be- 
ginning of the data, but take progressively larger values 
of iV (i) AT (i) . Thus the first stage looks at data in the 
interval [T ,T + N^AT^], the second stage looks at 
[T , T + N^AT^} and so on. The total observation 
time is thus 



(22) 



In "fresh-data" mode, rather than always starting over 
from the beginning, we analyze fresh data at each 
stage. The first stage looks at data in the range 
[T , T + 7VWATW], the second stage looks at [T + 
jV(i)AT«,T + TVWATW + N^AT^}, etc. The to- 
tal observation time is thus 



stage is expected to be much more sensitive than any 
of the preceding steps; therefore the overall FA rate is 
essentially set by the final stage threshold alone. (The 
earlier stages serve only to whittle down the number 
of candidates, N co h, that are analyzed in the final co- 
herent stage.) If the threshold in the final follow-up 

stage is p[t° h ^ , then the overall FA rate is no larger than 
a^ coh \p[^ oh ^) times the number of effectively indepen- 
dent candidates in parameter space. We approximate 
the latter, crudely, by <~ -/V p (T max , 0.2, 1); in practice 

a ( coh ) (p[^ oh ') turns out to be so minuscule that the crude- 
ness of this approximation is irrelevant. 

The overall false dismissal requirement is also easily 
handled. Let (i be the total false dismissal rate of the 



multistage search. Each stage has its own threshold p 



(0 

th 

and corresponding false dismissal rate (3^ . If each stage, 
including the follow-up stage, were to analyze completely 
independent data, we would have 



n+l 



= 1 - Y[ (1 - /3 W ) w + • • • (3 {n+1) . 



(24) 



(where we use interchangeably with (3^ coh ^). In 

our fresh-data search mode, the data at different stages 
are independent, except for the final, follow-up stage. 
And in our recycled-data scheme, the data examined at 
higher stages includes all the data examined in earlier 
stages. Then when (3^ = 0.1 and 



0(2) = ^O) 



it is clear that (3 is roughly in the range (10 + n — 1)% to 
(10 + n)% , for fresh- datamodeandl0% to(10+n)% for 
recycled-data mode. Since n w 3 turns out to be optimal 
(see below), we crudely summarize this by saying that 
our strategies have an overall FD rate of 10 — 15% at the 
threshold value of h RMS . 

Finally, we turn to the search's computational cost, 
which we wish to minimize. Let us denote the total num- 
ber of floating point operations for the i th semi-coherent 
stage by and for the final coherent stage by C^ oh \ 
Expressions for and C^ ^ are given in the next sec- 
tion. For now, it is sufficient to say the total computa- 
tional cost is 



c total = (j2 c ' il) ) + c ' {coh) > 



(26) 



T used =^7VWATW 



(23) 



In either data-recycling or fresh-data mode, one con- 
straint is that T usod < T max where T max is the total amount 
of data available. Also, in either mode, at each stage we 
look only at portions of parameter space that exceeded 
the threshold set at the previous stage. 

Next we consider our constraints on the overall FA and 
FD rates for the pipeline. The final, coherent follow-up 



and that if we wish to analyze the data in roughly real 
time, the required computational power (operations per 
unit time) is 



T, 



total 



(27) 



Depending on which mode we are working in, T usod is 
given by Eq. (22) or Eq. (23). 

Again, our strategy for optimizing the search is to min- 
imize P, subject to the constraints listed above. 
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IV. TEMPLATE COUNTING, CONFIDENCE 
LEVELS, AND COMPUTATIONAL COST 

A. Template counting formulae 

This section gives the template counting formulae orig- 
inally derived by BC using the metric formulation devel- 
oped in [8]. 

For simplicity, the parameter space is covered by 
spheres of proper radius ^//i max (^i max is the maximum al- 
lowed fractional mismatch in the detection statistic [8]) 
using a cubic grid. However it is worth keeping in mind 
that a cubic grid over-estimates the number of required 
templates even in two dimensions, and the difference in- 
creases rapidly with the dimensionality [9]. 

As mentioned earlier, for each semi-coherent stage, we 
have a coarse grid for the demodulation and a fine grid 
for the stack-slide analysis. Following BC, for simplicity 
we shall require that at any given semi-coherent stage, 
the maximal mismatch ^ max for the fine grid is the same 
as /i max for the coarse one. However (unlike BC), we allow 
Mmox to vary from one stage to the next. 

The number of templates (or gridpoints) N p is a func- 
tion of the mismatch /Li max , the coherent time baseline 
AT, and the number of stacks N (which is unity for the 
coarse grid). BC have derived the following expressions 
for the number of gridpoints, N pc and N p f, in the coarse 
and fine grids, respectively: 



N pc = iV p (AT, Mmax ,l), 
N pf = iV p (AT, Mmax ,7V). 

where N p is given in Eq. (2.22) of BC: 



(28) 
(29) 



N„ 



max 

sG{0,l,2,3} 



MsJVsGs]] (l 



fe=0 



0.3rft fe+ M 



Here r = 1 AU is Earth's orbital radius, ft = 27r/(lyr), 
s s/2 /^ ax AT s < s+3 )/ 2 



M s 



(* + 2) s/2 (^/s) s/2 t± +1)/2 ' 



(30) 
r), 

(31) 



M s = ( 



/ m «> | 2 (a + 2) 
lHz j 4/z max 



where 



A = 0.014 ,B = 0.046 



1 



AT 



1 



1 

CP 



-1/2 



(32) 



C = 0.18 



AT 



1 day / \ 1 day, 

(33) 

and the functions G s are given in Appendix A of BC. 
Roughly speaking, the factor M. s counts distinct patches 
on the sky as set by the Earth's one-day spin pe- 
riod, M s counts distinct "patches" in the space of spin- 
down parameters, the G s give the dependence of N p on 
the number of stacks, N, and the factors of the form 



^1 + 0,3 J^^" min ^ effectively account for the increase of 

search volume required when the frequency derivative 
d k f/dt k is dominated by the Dopplcr shift from the 
Earth's motion around the Sun rather than by the pul- 
sar's intrinsic spin-down. In our numerical work we use 
the full expressions for the G s given in the Appendix A 
of BC, but for completeness we note that BC also give 
the following approximate fits to the G s , which are valid 
when N > 4: 



Go (AO 
Gi(iV) 
G 2 (N) 
G 3 (N) 



1, 

0.524N, 
0.07087V 3 , 
0.00243iV 6 . 



(34) 
(35) 
(36) 
(37) 



The N p results in BC were derived under the assumption 
that the observation time is significantly less than one 
year. As we shall see below, in the cases where the to- 
tal available data covers an observation time of a year or 
more, it turns out that for the optimal search, the initial 
semi-coherent stages typically analyze a few days' to a 
few months' worth of data. Also, most of the search's 
computational cost is spent on these early stages. (This 
is especially true for the young-pulsar search, which is the 
most computationally challenging.) Therefore, it seems 
reasonable for our purposes to simply use the N p for- 
mulae from BC for all observation times. Since the cost- 
errors we make by using the BS formulae will be confined 
to the later stages, and since the overall sensitivity of the 
search is effectively set at the first stage, we believe these 
errors will not significantly affect the total computational 
cost, for fixed threshold (though they may affect the rel- 
ative allocation of resources between the different stage). 
Of course, the validity of this assumption can only really 
be checked by re-doing the calculation using more accu- 
rate expressions for the N p 's, appropriate for year-long 
observation times, but unfortunately such expressions are 
not currently available. 

Even for short observation times, the N p calculation 
in BC used the approximation (17), which neglects the 
amplitude modulation of the signal; however this approx- 
imation is not expected to cause significant errors in es- 
timating template numbers. 



B. False dismissal rates and the thresholds 

In this subsection, we discuss the statistical proper- 
ties of the stack-slide search and solve the false dismissal 
constraint to obtain expressions for the thresholds. 

It is shown in [3] that the distribution of the ^-statistic 
(or to be more precise, for each coherent search, 

is given by a non-central \ 2 distribution. The non- 
centrality parameter r\ is given in terms of the signal h(t) 
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by: 



= i- 



2hl 



Sn(f) 

.AT 



df 



Sn(f) 



(38) 



where h(f) is the Fourier transform of h(t). We have in- 
cluded a fitting factor of 1 — ^ max /3 to account for the av- 
erage loss in power due to the mismatch between the sig- 
nal and template. h RMS is the root-mean-square value of 
the signal h(t). We can relate h RMS to the amplitude h RMS 
defined in Eqs. (3) and (4), as follows. If one averages 
/i EMS over all sky-positions as well as over the polarization 
parameters i and ip, one obtains < ^ MS >= (2/25)/iq 
(see Eq. (93) of [3]). 

More explicitly, the distribution is 



p(JF\rj) = 2 X 2 (2T\ V ,4) 



n 



1/2 



■ n /2 



(39) 



where X 2 ('\ 7 1> 1 ') is the x 2 distribution with v degrees of 
freedom and non-centrality parameter -q, and I\ is the 
modified Bessel function of first order. The statistic p of 
interest for the stack-slide search is the sum of the T- 
statistic over N stacks. Assuming the ^-statistic for the 
N stacks to be statistically independent, 2p must follow 
a x 2 distribution with 4iV degrees of freedom and non- 
centrality parameter Nrj 



p(p\r 1 ,N)=2x 2 (2p\Nr]AN). 



(40) 



The mean and variance of p are given respectively by 



2N+- 



a; = 2N + Nrj. 



(41) 



Using the distribution p(p^), the false alarm rate for 
the i th semi-coherent stage (defined in Eq. (20)) can be 
evaluated analytically: 



2JV<*'-1 

£ 

k=0 



k\ 



(42) 



As discussed earlier, the overall false alarm probability a 
for the search is set by the final coherent follow-up stage. 
For this stage, N — 1 so that if the threshold on p is 
p(c° h ) ^ then j s eaS y to see from the previous equation 
that: 



a 



(1 



•Pth )e Pth 



(43) 



In the presence of a signal, the non-central x 2 distribution 
for p is a little cumbersome to work with, and it is useful 
to replace it by a Gaussian with the appropriate mean 
and variance. So we say that the distribution of p must 



be approximately Gaussian with mean and variance as 
in eq. (41): 



p(p\f],N) 



1 



-(p-p) 2 /2a 2 p 



2^2 



(44) 



This approximation is not valid when N is of order unity. 
Then for any given h th , we should set the threshold of the 
i th stage, pW by the false dismissal requirement: 



p(p\r&\N)dp 



/3« 



where 



vtt ■= I 1 



3 



2/i t 2 h AT« 
SJJ) 



(45) 



(46) 



Here the factor of 1 — /U^ x /3 accounts for the average loss 
in power due to the mismatch between the signal param- 
eters and nearest gridpoint parameters. Eq. (45) can be 
solved to find as a function of /i th , AT^\ and pS l \ 
Or equivalently, it gives h th = AT«, p^). This 

equation can easily be solved by using the properties of 
the complementary error function. By changing variables 
in the integral, we can rewrite the false dismissal rate as 



/3« = ierfc 
2 



1« 



(i) 



(47) 



If h th is the smallest value of h RM s for which the false 
dismissal rate is no bigger than /?W , then we have 



P (i) (V) - P W - V2 ( j«crfc- 1 (2/3( 4 



2N il 



(i) 



- 2crfc- 1 (2/3^)y7VWyi + ^. (48) 

In practice, we fix one value of h th (our sensitivity goal) 
for an an entire search, and we then set the threshold pW 
at each stage by solving Eq. (48), with the false dismissal 
rates set by /?W = 0.1 and (3^ = (3 coh = 0.01 for i > 2. 
Our rationale for this choice is as follows. At each stage, 
one can estimate the signal strength of any successful 
candidate. If after the first stage, one can already predict 
that a candidate is not strong enough to pass over the 
threshold at the second or a higher stage, then one might 
as well discard it immediately and so not waste computer 
power on a likely failure. Put the other way, an efficient 
algorithm should ensure that a true signal that is strong 
enough to pass over the first stage is also strong enough to 
pass over all subsequent stages. Then the false dismissal 
rate for the whole search will be only a little larger than 
the FD rate of the first stage alone, or a little more than 
10%. (An overestimate of the total FD rate is the sum 
of the rates for each of the stages, or 13% for a 3-stage 
search.) 
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C. Computational Cost 

Let us begin with the first semi-coherent stage. Here, 
the number of points in the coarse and fine grids are 
respectively 



JV£> = N p (AT^,^l,l), 

N p (AT^,^l,N^) 



N 



(i) 
pf 



(49) 
(50) 



If we are searching in a frequency range from small 
frequencies up to / max , the data must be sampled in the 
time domain (at least) at the Nyquist frequency 2/ max . 
The minimum number of data points that we must 
start out with in the time domain is then 2/ max AT. 
To calculate the ^-statistic for each stack, we need to 
first calculate the quantities F a and Fb which appear in 
equation (11). We describe two methods below which 
may be called the stroboscopic resampling method and 
the SFT method. Given F a and Fb, the cost of combining 
them to get T is negligible. 

The stroboscopic resampling method: The method sug- 
gested in [3] (and also in [2]) is based on the observation 
that the integrals in Eqs. (12) and (13) look almost like a 
Fourier transform; the difference being the form of <f>(t) 
in the exponential. However, by suitably resampling the 
time scries, effectively redefining the time variable so that 
the spectrum of a real signal would look like a spike in 
a single frequency bin, the integral can be written as a 
Fourier transform and we can then use the FFT algo- 
rithm. Since the cost of calculating an FFT for a time 
series containing m data points is 3m log 2 m, the oper- 
ations cost of calculating the JF-statistic for each stack 
should be approximately 12/ max ATlog 2 (2/ max AT). Re- 
peating this for stacks and for each point in the 
coarse grid, we see that the total cost of calculating F a 
and Fb, and therefore the .T 7 - statistic, is approximately 

12jV«jV«/ max AT« log 2 (2.f max AT«) . (51) 

We now need to appropriately slide each segment in fre- 
quency space and stack them up, i.e. add the ./-"-statistic 
values from each stack to get our final statistic p. This 
has to be done for each point in the fine grid. The cost 
of sliding is negligible and we need only consider the cost 
of adding the ^"-statistic values. Since adding A^ 1 ) real 
numbers requires N^> — 1 floating point operations, we 
see that the cost of stacking and sliding for all frequency 
bins and for all points in the fine grid is approximately 



/^atWa^atW-i). 



(52) 



Thus, the computational cost for the first semi-coherent 
stage is 



= f m 

res J" 1 



N., 



N, 



AT (i)Ar(i) 
(i) 

g-(iV« - 1) 



12iV« 



log(2/ max AT( 1 )) 
log 2 



pc 



(53) 



The subscript rc a indicates that this result is for the 
stroboscopic resampling method. 

The SFT method: An alternative method is to use as in- 
put not the time series, but rather a bank of short time 
baseline Fourier Transforms (SFTs). This is in fact the 
method currently being used in the search codes of the 
LIGO Scientific Collaboration [7]. Here one first breaks 
up the data into short segments of length T sft , and cal- 
culates the Fourier transform of each segment. (These 
segments, which are to be combined coherently, arc not 
to be confused with the segments used in the stack-slide 
algorithm which are combined incoherently). T sft should 
be short enough so that the signal does not drift by more 
than half a frequency bin over this time. Typical values 
of T sft are 1800s. The exact method of calculating the T- 
statistic from an SFT database is sketched in Appendix 
A, and the operations count is also derived there. The 
result is (see Eq. (A12)): 



(AT«) 5 



Flops . 



(54) 



Note that the SFT method of calculating the ^-statistic 
is 0((AT^) 2 ) while for the stroboscopic resampling 
method it is ©(ATW log AT^). 

The total cost of stacking and sliding in the first hier- 
archical stage using the SFT method is thus: 



a 



(i) 



AT«AT« 

x pc 



640Ar«ATW 



+ 



(i) 
pf 



N, 



(i) 



(A^ (1) - 1) 



pc 



(55) 



When all frequencies are to be searched over, strobo- 
scopic resampling produces the ^"-statistic about an or- 
der of magnitude more cheaply than the SFT method, for 
typical values of AT^\ However when previous stages 
have narrowed the search to a small fraction of the whole 
frequency band (for any given A), the SFT method can 
be the more efficient one. We should also mention here 
that it is possible to start with SFTs and combine them 
in such a way as to get a 0(AT^ log AT^) operations 
count; this is in fact the method used in [10]. However, 
in this paper, by the "SFT method" we always mean the 
method described here in Appendix B, with the opera- 
tion count given above in Eq. (55). 

It also seems likely that the resampling method could 
be modified so as to be the most efficient one, even when 
only wanted to demodulate a small frequency band 



A/ = max < 1 



AT( { - r 



(56) 



around every selected candidate. Presumably the first 
step would be to heterodyne the data to shift the relevant 
frequency range to a neighborhood of zero-frequency 
Then one would filter out frequencies higher than A/, 



10 



followed by the usual demodulation. Eq. (60) would then 
be modified, so that the new cost of demodulating would 
be 12AT«AT«A/log 2 (2AT«A/). However since the 
details of this modified demodulation method have not 
yet been worked out, we will not consider it further in 
this paper. 

This completes our analysis of the first stage compu- 
tational costs for both methods. The analysis for the 
subsequent stages proceeds similarly; the only difference 
is that subsequent stages analyze only those regions of 
parameter space that have not been discarded by any of 
the previous stages. Assuming that almost all the can- 
didates are due to noise, the false alarm rate is a good 
estimate of the number of candidates produced by any 
stage. Let us denote by FW the number of candidates 
which survive the i th stage. Since the false alarm rate for 
the first stage is , the number of candidates produced 
by the first stage is given by 

F« = maxjlJ^ATWivW^ 1 )} . (57) 

Note that we will always have at least one candidate 
which makes it through to the next stage. To calculate 
the cost of a search, we of course must make some as- 
sumptions about the data to be processed. Basically, we 
are assuming that the data consists of Gaussian noise plus 
one detectable source. (Though we call F« the Ho- 
stage false alarm rate" , it is really the "false alarm rate or 
the true-source survival rate, whichever dominates" . In 
practice, until the last semi-coherent stage, the FA rate 
always dominates.) 

To estimate the computational cost for the i th stage, 
for i > 1, recall that each of the F^ -1 ) candidates pro- 
duced by the (i — l) th stage is in fact a region in pa- 
rameter space. If we assume that the i th stage further 
refines this region, then we see that the number of i th - 
stage coarse grid points in this region must be, on aver- 
age, NpJ /NpjT 1 ^ (again, assuming this ratio to be big- 
ger than 1). Thus, using the stroboscopic resampling 
method, the number of floating point operations to cal- 
culate the ^-statistic in the i th stage is 



ir(i-i) 



max < 1, 



N, 



N. 



(i-i) 



pf 



12/ max AT w Ar (4) log 2 (2/ max AT w 

(58) 

Each candidate produced by the (i — l) th stage occupies 
a frequency band l/AT^* -1 ), and thus corresponds to 
ATW/AT^" 1 ) i th -st&ge frequency bins. Thus the oper- 
ations count for the stacking and sliding is 



F^'Vax 1 



AT^- 1 ) 



max < 1, 



N, 



(i) 



N, 



(i-i) 



Pf 



N (i) 

(JVW-1) 



N, 



(59) 



pc 



floating point operations. Combining these results, we 



get the computational cost for the i stage (i > 2): 



C (i) 



p(i-i) 



max < 1 , 



N, 



(i) 



jyr(i-l) 



l2N u )f A (rt log(2/ M .ArW) 
log 2 



(60) 



+ max < 1 



AT« I N, 



(i) 



(jyW - 1) 



pc 



If instead one uses the SFT method for calculating the 
.T 7 - statistic, it is easy to see the operations count is 



L'sft 



Fornax jl, 










T s f t 


N {1) 

Jv pc 



max < 1 



AT^" 1 ) 



(jyW - l) 



(61) 



After the n semi-coherent steps, we have the final co- 
herent follow-up stage where the entire stretch of data of 
duration T usod is used. For this stage, we analyze 
candidates and simply compute the JF-statistic without 
breaking up the data into any smaller stacks. The cost 

C {coh) for thig 

using the resampling method is 



c (coh) =F («) max J ! 

res 1 ' 



N, 



(coh) 



log(2/ max T UBed ) 
log 2 



(62) 

where N^ h = N p (T UBCd , [i co h, 1), and fj, co h is the 

of the final, coherent stage. Using the SFT method, we 

would have 



Cr h) = Fornax <h. 



(coh) 



N 



(n) 



max < 1 



Pf 



' ATM 



640T U3Cd 
T e[t 
(63) 

So far, all results in this section are valid whether we 
are working in fresh-data mode or data-recycling mode. 
The following formulae, for the number of candidates 
which survive a given stage, do however depend on which 
mode we are working in. If we operate in fresh-data mode 
(analyzing fresh data at every stage-except the last stage, 
which is a coherent follow-up of all the searched data), 
we clearly have (for i > 2) 



o/^max -i F 



x max 




Again, our count assumes that at least one candidate 
gets "promoted" to the succeeding stage. We note that 
Eq. (64) assumes that the parameter space resolution im- 
proves at every stage of fresh-data mode (which seems 
always to be true for our optimized searches). We also 
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note that Eq. (64) is basically identical to Eq. (5.2) of 
BC, but there it is claimed to be the FA rate for data- 
recycling mode. That is not correct, in general, as we 
discuss further below. 

If we are in data-recycling mode (at each step, re- 
analyzing old data, while also adding on new data), then 
the probabilities of a candidate's randomly surviving the 
(i — l) th and i th stages are not independent, and so 
Eq. (64) is no longer valid. (To see this, consider the 
limit where only a very tiny bit of data is added on, and 
the resolution is kept fixed. Then any candidate that 
survives the (i — l) th stage has almost a 100% chance of 
surviving the i th stage, even if c*W is extremely small.) 
Indeed, the rhs of Eq. (64) is clearly a lower bound on 
the i th -st&ge false alarm rate, in data-recycling mode. 

We can also place the following upper bound on on 

F {i) 

for data-recycling mode: 



r — Jn 



<AT«iV$a«. 



(65) 



The rhs of (65) is the number of false alarms 
that would result if one performed a semi-coherent 
search of the entire parameter space with the given 
(JVW, ATW,^W,pW), while the lhs is the false alarms 
that result from searching only neighborhoods of the 
points the survived the (i — l) th stage. Thus for data- 
recycling mode, we can say that is somewhere in the 
range 



jr(<-i) 



N (i-i) AT^" 1 ) 



< / max AT«A^a« .(66) 

Fortunately, when we calculate the total computational 
cost of some optimized search in data-recycling mode, 
needed to achieve some given sensitivity h t h, if we try 
plugging in either the upper or lower bound for F^ l \ we 
find the two final results differ from each other by <J 18% 
for a young palsar (r TO j„ = 40 year) and <J 5% for an 
old one (r m i n = 10 6 year), which for our purposes is 
practically insignificant. Moreover, the optimized search 
parameters obtained when we plug in the upper-limit es- 
timate for i^W are quite similar to those we find by plug- 
ging in the lower limit instead. Therefore it is safe for us 
to choose either the upper or lower limit as an estimate 
of FW. For concreteness, in the rest of this paper we 
always estimate by its upper limit, which slightly 
overestimates the computational cost of the search. 

With these results in hand, we are now ready to cal- 
culate the total computational cost of the entire search 
pipeline. We have a number of choices to make. At 
each stage, we can use either the stroboscopic resam- 
pling method or the SFT method in each stage, and we 
can work in either the data-recycling mode or fresh-data 
mode from the second stage onwards. For convenience, 
we somewhat arbitrarily limit the choices by considering 
only strategies that use either data-recycling mode in ev- 
ery stage or fresh-data mode in every stage. As we shall 



see below, the efficiencies of these two sorts of searches 
turn out to be extremely close anyway. Therefore we 
strongly suspect that more general searches (using fresh- 
data mode in some stages and data-recycling mode in 
others) would not give significant improvements. 



V. RESULTS 

A. The optimization method 

We next describe our numerical optimization method. 
The function we want to minimize, the computational 
power of Eq. (27) , is a complicated function on a large- 
dimensional space. Our chosen method is a simulated 
annealing algorithm [11, 12] based on the downhill sim- 
plex method of Nelder and Mead [13]. The downhill sim- 
plex method consists of evaluating the function on the 
vertices of a simplex and moving the simplex downhill 
and shrinking it until the desired accuracy is reached. 
The motion of the simplex consists of a prescribed set of 
"moves" which could be either an expansion of the sim- 
plex, a reflection around a face, or a contraction. This 
method is turned into a simulated annealing method by 
adding a random fluctuation to the values of the function 
to be minimized, at the points of the simplex. The tem- 
perature of the random fluctuations is reduced appropri- 
ately, or in other words "annealed", until the minimum 
is found. 

There are no universal choices for the rate of annealing 
or the starting point of the simplex; these depend on the 
particular problem at hand. For the results presented 
below, we have used a variety of different starting points 
and annealing schedules to convince ourselves that the 
optimization algorithm has converged and that we have 
indeed found the best minimum. Let us first discuss the 
starting temperature, whose meaning is as follows. If / is 
the the function to be minimized, then the temperature O 
parametrizes the amplitude of random fluctuations / — > 
/ + Sf added to / at the points of the simplex: 



5f = — logr 



(67) 



where < r < 1 is a uniformly distributed random num- 
ber. A simplex move is always accepted if it takes the 
simplex downhill, but an uphill step may also be accepted 
due to these random fluctuations. In our case, we found 
that a starting temperature of ~ 10 6 -10 9 gives good 
convergence; this value is to be compared to the typical 
value <~ 10 13 of the computational cost near its minimum 
for most of the results presented below. We allow a maxi- 
mum of 500 iterations of the simplex. If the simplex does 
not converge within 500 iterations, we reduce the temper- 
ature by 2 — 5% and restart the iterations from the best 
minimum found up to that point. These steps are re- 
peated until the simplex converges. The starting point 
of the simplex cannot be chosen arbitrarily, and for this 
purpose, it is useful to have a rough idea of the location 
of the minimum. This requires some experimenting with 
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a sufficiently broad range of starting points; this is espe- 
cially important when the number of variables is large, 
as is the case for, say, a search with n > 3. Having found 
a suitable starting point for one set of pulsar parame- 
ters (/ max and T min ), it can be reused for nearby pulsar 
parameter values. Occasionally, to obtain convergence 
to a minimum, we have taken as our starting point the 
minimum we found for nearby pulsar-parameter values. 

We next describe how we impose the constraint that 
the total amount of analyzed data is less than T max . One 
could imagine trying to do this using the method of La- 
grange multipliers. However this seemed difficult to im- 
plement numerically (for our highly non-linear function 
P), and we found a simpler approach that suffices. The 
function our algorithm minimizes is not the total compu- 
tational power P (defined in Eq. (27)) itself, but rather 



f = Px 



1 + 5 



where S(x) is a smooth function such that S(x) = for 
< x < 1 but S(x) is rapidly increasing exponential 
function for x > 1. That is, we impose a very steep 
penalty for leaving the constraint surface. This works 
well, and indeed we found it useful to impose some addi- 
tional (intuitively obvious) constraints in this way, such 
as requiring the and ATW to all be positive; we 
again multiply P by factor that is unity when the con- 
straint is satisfied but is very large when the constraint 
is violated. This "trick" is used to find the location of 
the minimum, but of course the results we report are the 
values of the function P there, not /. 

There is one additional technical detail, namely, that 
our optimization method is meant for the case of contin- 
uous, real variables, while our variables are strictly 
integers. We handle this by rounding off to the 
nearest integer while calculating the cost function /, ev- 
ery time it is called. The downhill simplex algorithm still 
treats iVW as a continuous variable, i.e. we allow arbi- 
trarily small changes to iVM when the simplex is moving 
downhill, but such changes have no effect on /. We have 
also tried an alternative approach where iVW is kept as 
a continuous variable throughout, and rounded off only 
at the very end. We have found that the two approaches 
yield consistent results. 

Finally, we cross-checked our results using two different 
implementations of the simulated annealing algorithm 
those of [14] and [15]-and found that they gave basically 
equivalent results in our case. 



B. The number of semi-coherent stages 

The first question we want to answer is: what is the 
optimum number n of semi-coherent stages to use in the 
search? Relatedly, we want to know the most efficient 
method to use for the JF-statistic calculation (strobo- 
scope resampling or SFT method) and best mode to 
work in (fresh-data mode or data- recycling mode). To 



answer this, we consider an all-sky search for fast/young 
GW pulsars, by which we mean a search that goes up 
to frequency / max = 1000 Hz and that can detect pulsars 
with spindown ages r min as short as 40 yr. We assume the 
amount of data available is T max = 1 yr, and ask what is 
the computational power required to detect pulsar signals 
whose h RM s is or above hth, given by: 



h 2 th 

Sn(f) 



2.5 x 10"°sec" 



(69) 



This signal strength corresponds to ^Jr\ k, 39.72 for a full 
1-yr observation time with a perfectly matched template. 
(Here and below we are implicitly assuming that S n (f) 
hardly varies over the frequency range of the signal.) We 
choose the i* h -stage FD rates (3^ as given in (and just 
above) Eq. (25), which, along with the detection thresh- 
old given by Eq. (69), determines the i th -st&ge thresholds 
(68) For S i m pii c ity ; W e set ^ coh ^ = 0.2. While this is a 



restriction that we simply put in by hand (to slightly re- 
duce the space of search parameters to be optimized), 
we believe this choice has very little effect on the over- 
all optimized strategy because, as we shall see shortly, 
the follow-up stage usually accounts for only a tiny frac- 
tion of the total computational cost. [20] Thus we are left 
with 3n parameters to be optimized: (ATW^M^iVW) 
for i = 1, . . . ,n, subject to the constraint that the to- 
tal amount of data analyzed, T use d [given by Eq. (22) or 
(23)] is less than 1 yr. 

Plots of the minimum computational cost for different 
n and for both the data-recycling and fresh-data modes 
are shown in Fig. 2. For each mode, we consider the 
following three strategies: (i) Use the SFT method in 
each stage, (ii) Use the resampling method in each stage, 
and (iii) Use the resampling method in the first and final 
follow-up stages, and use the SFT method in all interme- 
diate stages. Therefore there are 6 curves in Fig. 2. 

The most important lessons from Fig. 2 are the fol- 
lowing: Strategy (iii) turns out to be better than (i) or 
(ii). Furthermore, for strategy (iii), there is a signifi- 
cant advantage in a three-stage search as compared to a 
two-stage or single-stage search, but there is hardly any 
improvement in computational cost in going to four or 
more semi-coherent stages. Furthermore, these results 
arc the same whether we use the fresh-data mode or 
data-recycling mode, and these two modes give very sim- 
ilar total costs. While Fig. 2 presents results just for 
young/fast pulsar searches, we find the same basic pat- 
tern for old pulsars, with r m , n <~ 10 6 yr: strategy (ii) is 
the most efficient for calculating the ^-statistic, data- 
recycling mode and fresh-data mode are almost equally 
efficient, and having three semi-coherent stages is near- 
optimal (significantly better than two stages, and prac- 
tically as good as four). The main difference from the 
young/fast pulsar case is that the gain in going from 2 to 
3 stages is now only a factor ~ 2 in computational power, 
i.e., smaller but still significant. 

In the light of these results, in the rest of this sec- 
tion, we consider only three-stage searches, with the first 
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Number of Stages 



FIG. 2: Computational power versus number of semi-coherent 
stages for different methods of calculating the .^-statistic. 
RES indicates the stroboscopic resampling method (strategy 
(ii)) and SFT is the SFT method (strategy (i)). SFT+RES 
corresponds the mixture of these two methods (strategy (iii)). 
For each strategy, solid lines indicate the result for the fresh- 
data mode, while the dashed lines are for the data-recycling 
mode. 



stage and final follow-up stages employing the resampling 
method and with the second and third stages employing 
the SFT method. We continue to report results for both 
data-recycling mode and fresh-data mode. 



C. The optimal three-stage search parameters 

For the example search described above, (i.e. / max = 
1000 Hz, r ml „ = 40 yr, T max = 1 yr, and h\JS n = 
2.5 x 10 _5 sec _1 ) we list the optimal search parameters for 
the three-stage search in data-recycling mode in Table I. 
The first two stages analyze about 26 days (divided in 10 
segments) and 42 days (divided in 12 segments) of data, 
respectively, while the third stage analyzes the entire 
year-long data stretch (divided in 8 segments). The total 
computational cost is 40.2TFlops. The cost breakdown 
among the individual stages, and further cost breakdown 
into the demodulation piece Cj£ h and the stack-slide Cs % ) 
piece in each stage, are given in Table II. There we give 
the total count of floating point operations required, not 
the number of operations per second. 

Our results for fresh-data mode are qualitatively simi- 
lar, and are given in Table III. In this case, the optimal 
search analyzes about 24 days of data in the first stage 
(broken up into 9 segments) and 24 more days in the 
second stage (broken into 6 sements). The third stage 
analyzes the rest of the year's worth of data, divided 
into 7 segments. The total computational requirement is 
34.6 TFlops and its breakdown is given in Table IV. 



TABLE I: The optimal search parameters in data recycling 
mode. / max = 1000Hz, r min = 40yr, T max = lyr, h 2 th /S n = 
2.5 x 10 _6 sec _1 , and r\ is defined according to Eq. 38. 



Stage 


AT W (days) 




N (i) 


T^L (days) 


Vv 


1 


2.58 


0.7805 


10 


25.79 


9.08 


2 


3.51 


0.1139 


12 


42.13 


13.23 


3 


45.66 


0.8196 


8 


365.25 


33.86 



TABLE II: The computational cost to analyze one year of 
data in data-recycling mode. The search parameters are the 
same as given in Table I. C^L is the cost for the coherent de- 
modulation step and for the stack-slide step, while is 
the sum of these two. Follow-up indicates the computational 
cost require for the final follow-up stage. 



Stage 


C* w (Flop) 


C&1 (Flop) 


C# (Flop) 


1 


9.37 x 10 20 


6.21 x 10 19 


8.75 x 10 20 


2 


3.16 x 10 20 


2.46 x 10 20 


6.98 x 10 19 


3 


1.65 x 10 19 


2.73 x 10 18 


1.37 x 10 19 


Follow-up 


6.30 x 10 15 







We note the following features of these results. First, 
in both modes, basically all the data has been analyzed 
by the end of the third semi-coherent stage. This is 
not a requirement that we put in by hand, but rather it 
arises from the optimization: the optimal scheme "gets 
through" the entire year's worth of data before the final 
follow-up stage. Secondly, in data-recycling mode, 73.8% 
of the computing time is spent in the first stage, 24.9% 
in the second, 1.3% in the third and a negligible fraction 
in the follow-up. The results are similar for the fresh- 
data mode: approximately 74.2% of the computational 
resources are spent in the first stage, 24.2% in the second 
stage, 1.6% in the third stage and a negligible amount 
in the follow up stage. Finally, fresh-data mode entails 
a slightly lower computational cost than data-recycling 
mode. However this last fact could be an artifact either 
of having slightly different overall FD rates in the two 
cases, or of our using an overestimate of in the lat- 



TABLE III: Same as Table I, but for fresh-data mode. 



Stage 


AT W (days) 




N (i) 


Til (days) 




1 


2.71 


0.7829 


9 


24.35 


8.82 


2 


4.08 


0.0654 


6 


24.49 


10.17 


3 


45.20 


0.8229 


7 


316.42 


31.50 



14 



TABLE IV: Same as Table II, but for fresh-data mode. 



Stage 


C w (Flop) 


c£L (Flop) 


7T\ 

C£ (Flop) 


1 


8.11 x 10 20 


6.42 x 10 19 


7.46 x 10 20 


2 


2.64 x 10 20 


2.62 x 10 20 


2.67 x 10 18 


3 


1.74 x 10 19 


5.54 x 10 18 


1.19 x 10 19 


Follow-up 


1.62 x 10 16 







TABLE V: Search parameters for data-recycling mode with 
/max = 1000Hz, r min = 10 6 yr, T max = lyr, h 2 th /S n = 4.53 x 
10~ sec - , and computational power 40.2Tfiops; r\ is defined 
according to Eq. (38). 



Stage 


ATW (days) 




N (i) 


Til (days) 


Vv 


1 


14.84 


0.3514 


8 


118.72 


9.06 


2 


30.06 


0.0917 


6 


180.34 


11.70 


o 


52.18 


0.0986 


7 


365.25 


16.63 



ter case. The bottom line is that, after optimization, the 
two modes are almost equally efficient. 

If instead we consider a search for older pulsars, with 
T mi „ = 10 6 yr instead of 40 yr, then the optimal solution- 
for both modes are summarized in Tables V-VIII. A 
larger value of r min means a smaller number of templates, 
and therefore a more sensitive search for fixed computa- 
tional cost. 

For data-recycling mode, we have lowered the thresh- 
old hth by a factor 2.35, to the point where the required 
computational power is again 40.2 Tflops, as in the ex- 
ample of Tables I and II. The results are shown in tables 
V and VI. Compared to the young-pulsar search, the 
computational power is now spread more evenly over the 
first two stages: the first stage consumes about 58.27% 
of the power, the second stage 38.73%, third stage 3.0% 
and negligible for the follow-up stage. 

For the case of fresh-data mode, we have lowered the 
threshold h t h by a factor 2.36, to the point where the 
required computational power is again 34.6 Tflops, as in 
the example of Tables III and IV. Once again, compared 
to the young-pulsar search, the computational costs are 
spread more evenly over the first two stages: the first 
stage consumes about 31.3% of the power, the second 
stage 30.4%, third stage 14.0%. In this case, the cost 
for the follow-up stage is 24.0%, which is not negligible. 
This indicates that, for this case, the earlier stages have 
not succeeded in reducing the number of candidates to a 
low level. The overall sensitivity, though, is still almost 
identical to the data-recycling case. 

Let us now discuss the false alarm rate. We require 
that the overall FA rate be less than 1%, and we claimed 
in section III B that this is automatically satisfied in typi- 



TABLE VI: The computational cost to analyze one year of 
data in data-recycling mode. The search parameters are the 
same as given in Table V. C c „ h is the cost for the coherent de- 
modulation step and ci'J for the stack-slide step, while is 
the sum of these two. Follow-up indicates the computational 
cost require for the final follow-up stage. 



Stage 


C* w (Flop) 


c { :i (Flop) 


C?J (Flop) 


1 

2 
3 

Follow-up 


7.41 x 10 20 
4.93 x 10 20 
3.82 x 10 19 
6.18 x 10 13 


2.85 x 10 18 
3.77 x 10 20 
1.34 x 10 19 


7.39 x 10 20 
1.16 x 10 20 
2.48 x 10 19 



TABLE VII: Search parameters for fresh-data mode with 
/ max = 1000Hz, r min = 10 6 yr, T max = lyr, h 2 th /S n = 
4.47 x 10 6 sec 1 , and computational power 34.6Tflops; rj is 
defined according to Eq. 38. 



Stage 


AT W (days) 




iV (l) 


Til (days) 


Vv 


1 


11.77 


0.2074 


9 


105.96 


9.04 


2 


10.97 


0.0199 


6 


65.82 


7.13 


3 


27.64 


0.0206 


7 


193.47 


12.22 



cal, realistic cases. We can now verify this claim. For the 
Tmin = 40 yr search summarized in tables I— IV, using (38), 
with /u max = fJ,^°^ = 0.2, the threshold corresponds to 
Plt° h) = 2+/7/2 w 738. By Eq. (43), this corresponds 
to a s=s 10~ 318 (for either mode). Using Eq. (30), the 
number of independent templates required for a full co- 
herent search of the entire parameter space, using 1 yr of 
data, is / max T max iVp(lyr, 0.2, 1) w 10 34 . The overall false 
alarm rate is thus FA = / max T max iVpS w 10~ 284 < 1% 
[21]. 

Similarly, for the case r min = 10 6 yr, h^ h /S n = 4.53 x 
10~ 6 (data recycling mode, tables V and VI)), we get 
77 w 286 so that a ~ 10~ 61 . In this case, have N p w 
10 17 so that FA = / max T max 7Vj,a w 10~ 34 . If we look at 



TABLE VIII: Same as Table VI, except for fresh-data mode. 
The search parameters are those of Table VII. 



Stage 


C* w (Flop) 


c ( :i (Flop) 


c ( ;j (Flop) 


1 

2 
3 

Follow-up 


3.46 x 10 20 
3.35 x 10 20 
1.54 x 10 20 
2.67 x 10 20 


3.07 x 10 18 
3.33 x 10 20 
9.23 x 10 19 


3.43 x 10 20 
1.91 x 10 18 
6.19 x 10 19 
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fresh data mode (tables VII and VIII) with r min = 10 6 yr, 
h 2 th /S n = 4.47 x 1CT 6 , we get FA w 10~ 33 . These values 
are greater than for the case of young pulsars, but still 
vastly smaller than 1%. 

The basic point is simply this: For an all-sky search, 
sensitivity is limited by computing power, so the detec- 
tion threshold hth in practice is substantially higher than 
what it would be for infinite computing power. This 
means that for a signal to be detectable, it must have 
quite a high SNR (in the matched filter sense)-which 
means that the FA rate is exponentially small. Being 
computationally limited means that when we do detect 
something, we can be very confident that it is not simply 
random noise masquerading as a signal [22] 

How accurate are our numerical results? The total 
computational cost is a complicated function on a 9- 
dimensional space and thus is not easy to visualize. We 
can, however, take appropriate sections of this function 
to examine its behavior near the minimum. Thus, we can 
ask whether variations in, say, ATW or AT<-V 

away from 

their optimal values, increase the computational cost (as 
they should, if should if we have truly found a minimum) . 
To answer this, in Fig. 3 we plot the total computational 
power as a function of AT^ and AT^- 2 \ respectively, for 
the young-pulsar searches summarized in Tables I- IV. All 
the other parameters fixed at their optimal values. The 
minima of these curves agree precisely with our simu- 
lated annealing results. Similarly, Figs. 4 and 5 carry the 
same message, as well as showing the strong dependence 
of the computational cost on AW and //* ax . (It is not so 
clear from the plot of P vs. fi^} x that this curve has a 
minimum in the range shown, but it does in fact have a 
very shallow one.) 

For the plot of computational cost versus N^ 3 \ we 
are not allowed to keep AT^ 3 ) fixed, since that could 
violate the constraint AT^A^ < T max . (Recall that 
AT( 3 ) A^ 3 ) = T max = 1 yr for the optimal 3-stage solution, 
which is therefore just at the boundary of the constraint 
region.) Therefore, we choose to plot the computational 
cost as a function of while simultaneously varying 
according to 

AT ( 3 ) =T max /AT( 3 ). 
A noteworthy feature of these plots is that the compu- 
tational power P depends more sensitively on the early- 
stage parameters than the late-stage ones; e.g., more 
sensitively on A' 1 ' and A^ 2 ^ than on A^ 3 ^. This result 
should not be surprising since, as mentioned earlier, for 
the young-pulsar search the computational cost of the 
higher stages is relatively small. 



D. The spindown-age and the SNR 

How does the (minimum) computational cost depend 
on the shortest spindown timescale that we search over, 
7"mi„? Consider again the case where we have one year of 
data and we perform an all-sky search up to a frequency 
of / max = 1000 Hz. Fig. 6 shows the result for the both 
data-recycling and fresh-data mode, for two different val- 



,x 1 



q 13 Data Recycling 



. 10 13 Fresh Data 



r 7 

o 

0- fi 



Q_ 

E 









— AT< 1 > 
--- AT |2) 
















I 










\ 

















AT"' 
AT< 2 > 



3 4 5 
AT (days) 



6 2 



4 6 

AT (days) 



FIG. 3: Computational power P as a function of AT 1 ' 1 ' and 
AT^ 2 ' , with all other parameters fixed to their optimal values. 
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FIG. 4: Computational power P as a function of iV (1) , N {2) 
and iV (3) . For the N w and iV (2) plots, all other parameters 
have fixed to their optimal values. For the plot, we have 
also varied AT (3) according to AT (3) = T max /iV (3) in order 
to satisfy the constraint that the amount of data available is 
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FIG. 5: Computational power P as a function of /il^ x , fj.^l x , 
and /x^2x- For each plot, all other parameters are fixed at 
their optimal values. 
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FIG. 6: The minimum computational power P required for 
analyzing 1 year's worth of data as a function of the pulsar's 
spindown age r min . We consider a three-stage search in the 
both data-recycling and fresh-data mode, for two different 
signal strengths. The data-recycling mode results are shown 
with dashed lines, while the fresh-data results are in dotted 
lines. In parts of the curves, the results for the two modes are 
so close together that it is hard to distinguish them. 



ues of the 1-year SNR. Note that these results do pass 
simple sanity checks: the computational cost decreases as 
the SNR increases (since it is easier to look for stronger 
signals), and the computational cost decreases as r min 
increases (since it is easier to search through a smaller 
parameter space). 

One can also ask: for a given available computational 
power, how does the threshold SNR scale with T min ? This 
is shown in Fig. 7. The plot is based on the assump- 
tion that we have one year's worth of data and that we 
have 10 TFlops of computing power at our disposal. By 
"SNR" , here we mean the matched-filter SNR, for a per- 
fectly matched filter. Fig. 7 tells us that a search for 
unknown GW pulsars with spindown ages > 10 6 yr can 
detect ~ 85 — 90% of pulsars whose SNR is > 17 (again, 
with FA rate << 1%). In an all-sky search for very young 
pulsars, with r min — 40 yr, the SNR required for de- 
tection (with the same FD and FA rates) increases to 
~ 43. In comparison, for a source where the sky position 
and frequency arc known in advance (from radio obser- 
vations), an SNR of only 4.56 is required for detection, 
with a 10% FD rate and 1% FA rate [4]. 

Fig. 7 strongly suggests that one would like to simul- 
taneously perform at least two different all-sky searches: 
one for old GW pulsars and another for young ones, with 
comparable (within a factor ten) computer power de- 
voted to each, but with quite different thresholds. (If 
one set the same threshold for both old and young pul- 
sars, then almost all computing resources would end 
up being spent on the young ones.) Clearly, to deter- 



FIG. 7: The 1-year SNR (with zero mismatch) as a function 
of T min , for fixed computational power P = 10 13 Flops. The 
dashed line indicates the result for data-recycling mode and 
the dotted line for fresh-data. Since these two result are very 
close to each other, it may be difficult to distinguish them. 



mine the "best" apportionment of resources between the 
two types of searches would require some additional in- 
puts/assumptions, but at least Fig. 7 seems a good first 
step towards making an intelligent allocation. 



VI. CONCLUSIONS 

Let us first summarize the main results of this paper. 
We have studied general hierarchical strategies for search- 
ing for gravitational waves from unknown, isolated GW 
pulsars. In particular, we have considered multistage hi- 
erarchical algorithms where each stage (except the last) 
consists of a coherent demodulation of short stacks of 
data followed by appropriate sliding and stacking of the 
^"-statistics results from the different stacks. The succes- 
sive stages serve to quickly reduce the number of candi- 
dates; they are followed by final coherent follow-up stage 
to fully analyze the remaining candidates. 

We have optimized this strategy by minimizing the 
computational cost P subject to the constraints which 
specify the total amount of data available and the desired 
confidence levels. Of course, P depends on the size of the 
parameter space- in particular on the range of frequen- 
cies and spindown ages that are searched over. Carrying 
out the optimization, and varying over the number n of 
semi-coherent stages, we found that the advantages of 
the multistage approach saturate at n — 3 (i.e., n = 4 
and 5 are scarcely better). 

The optimized search parameters {N^ l \ AT ^ , ^ ) we 
report should only be considered a rough guide for carry- 
ing out a search in practice because i) in many places we 
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have used theoretical estimates of the operations counts 
instead of those obtained by profiling existing codes, ii) 
we have not considered issues of memory storage or the 
cost of performing any Monte-Carlo simulations, and iii) 
the detector noise has throughout been assumed to be 
Gaussian and stationary. Furthermore, the template 
counting formulae (30) used in this paper are, strictly 
speaking, valid only for observation times significantly 
less than a year. The numbers presented in this work 
must be recalculated when better approximations be- 
come available. In spite of these limitations, we believe 
that our results do provide a useful qualitative guide to 
what an optimized all-sky search "looks like." In order 
to optimize actual search codes, applied to actual data, 
one must 1) profile the codes to determine the actual 
computational cost of the different operations, and 2) do 
Monte Carlo studies to determine the actual for dif- 
ferent thresholds. (Recall that the formulae given here 
are based on the assumption stationary, Gaussian noise.) 
The latter could require considerable work, especially if 
the results are strongly frequency-dependent, with some 
bands being much "better behaved" than others. 

Finally, we mention some other possibilities for future 
work: 

• It would be trivial to extend our work to con- 
sider searches that are less computationally chal- 
lenging than all-sky ones, but that are still compu- 
tationally limited. E.g., one could consider searches 
for unknown NSs in supernova remnants (such as 
SN 1987A), in which case the sky-position is well 
known but the frequency and spindown parameters 
must be searched over. Similarly, one could con- 
sider a search over a small fraction of sky, e.g., a 
portion containing the Galactic center or the disk. 

• The formulae for operations counts, confidence lev- 
els etc. can also be derived for case when the Hough 
transform [6] is used in the semi-coherent stages in- 
stead of the stack-slide method; the optimization of 
multistage, hierarchical Hough-type searches would 
then proceed in the same way as developed here. 

• We expect that the lessons learned in this paper 
will carry over to searches for GW pulsars in low- 
mass X-ray binaries, which are also a computation- 
aly limited [16]. However the details are yet to be 
worked out. 

• The problem of searching, in LISA data, for the in- 
spiral signals of stellar-mass compact objects cap- 
tured by <~ 1O 6 M BHs in galactic nuclei, is similar 
to the GW pulsar search problem, but even more 
computationally challenging [18]. We expect that 
the lessons learned in this paper will also be very 
useful in formulating and optimizing a search algo- 
rithm for LISA capture sources. 

• In this paper we have tacitly assumed that the 
search is performed by a single computer or com- 
puting cluster. However, at least in the next 



few years, the most computationally intensive GW 
searches will be directed by Einstein@Home[17], 
which relies on tens of thousands of individual par- 
ticipants donating their idle computing power. In 
this case, there might be additional constraints that 
we have not yet considered, relating to the rate at 
which data and intermediate results can be sent 
back and forth between the Einstein@Home[17] 
servers and participants' computers, how much 
storage is available for use on participants' comput- 
ers, etc. We intend to study hierarchical searches 
in this context also, to see which if any of the 
lessons learned here must be modified for the 
Einstein@Home context. 
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APPENDIX A: COMPUTATIONAL COST OF 
THE SFT METHOD 

Here we estimate the computational cost (in floating 
point operations) of calculating the JF-statistic via the 
SFT method. This result is used in Sec.IV.C. 

We begin by reviewing the details of the SFT method; 
our description closely follows that given in the doc- 
umentation of the software package LIGO Algorithms 
Library [7]. Imagine that we wish to compute the T- 
statistic for a data stretch of length AT. Divide this 
data into M shorter segments of length T 3ft = AT/M, 
each containing N data points (so there are MN data 
points within AT). The sampled values of x(t) can then 
be written where < a < M labels the segment 

and < j < N labels points within a segment. Eq. (12) 
can then be discretized as follows: 

M-l JV-1 

F »w = E E (ai) 

a=0 j=0 

and similarly for Eq. (13). Let x a k be the discrete Fourier 
transform of x a j along the index j, so that 

1 Ar - 1 

x aj — ~jy ^ ' Xak£ ^ ^ ■ (A2) 
fc=0 

Then if we approximate the amplitude modulation func- 
tion a(t) as constant over the short-time baseline T aft , the 
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expression for F a becomes 

M-l JV-l 



a=0 



k=0 



1 f ^'TTI, *! 



N 



(A3) 

The short-time baseline T sft is generally chosen so that 
neither pulsar spindown nor the Doppler effect causes the 
signal power to shift by more than, say, half a frequency 
bin. Then we can find functions A a (\) and B ak {\) such 
that to a good approximation 



N 



N 



(A4) 



Thus we have 



1 N ~ 1 f2mjk A 

3=0 



= e ~iA a {\) 



N 

1 _ e -iB ak (X) \ 
1 - e -iB ak (*)/N ) 



(A5) 



Next we assume N is large enough that 1 — e lB a,kW/N 
iB ak (\)/N; then 



M-l 



JV-l 



F a = ^2 a a t 



-iA a (\) 



x ak P[B ak {\)] (A6) 



a=0 



k=0 



where 



P[x] 



1 1 



Nl- e ~ ix / N 

sin x 1 — cos x 
i 



(A7) 



M-l 



F a = ^ a ° 



-iA a (X) 



x ak ,P[B ak ,(\)] (A8) 



Q=0 



-D 



Now arises the great advantage of the SFT method: the 
function P[x] is sharply peaked about x = 0, so the 
sum over k can be approximated by retaining only a few 
terms: 



where k' = k — k* and k* is the value of k such that 
B ak * (A) = 0, and D is the number of terms that we 
retain in the sum on either side of A;*. It turns out that 
D = 16 suffices to calculate the ^-statistic to within a 
few percent. 

Eq. (A8) is our final approximation for F a . Analogous 
expressions hold for F , and the final JF-statistic is calcu- 
lated from F a and F b using Eq. (11). Thus, with the SFT 



method, for each point A in parameter space we need to 
calculate A a (X), B ak (\), and the amplitude modulation 
functions, a a and b a , and then to perform the sums in 
Eq. (A8). It is then easy to see that to calculate the T- 
statistic for n frequency bins, for a fixed value of A, the 
number of floating point operations required is roughly 
some constant times nMD. To estimate the constant, 
let C\ be the cost of calculating P\B ak \ for each a and k 
value. Since multiplying two complex numbers requires 
6 operations, and adding two complex numbers requires 
2 operations, we see that calculating the sum over k' in 
equation (A8) requires 



{C\ +6)(2D + 1) +4D 



(A9) 



operations. Similarly, if Ci is the cost of calculating 
a a e~ lAa for every a, then the cost of calculating F a is 

M[(C 1 +6)(2D+1)+4D]+MC 2 +6M+2(M-1) . (A10) 

Thus, to find F a and Ft, for every frequency bin requires 
«2x (2C\ + \Q)DM operations. Since the cost of com- 
bining F a and Fb to get T is negligible compared to this, 
and assuming C\ to be of order unity, we see that the 
operation count for calculating the ^"-statistic for n fre- 
quency bins is approximately 



AOnMD . 



(All) 



For the first stage in the hierarchical search, the T- 
statistic is evaluated for N (1 ' N^J f m ^AT bins. So, tak- 
ing D = 16 and using M = ATW/T sft , the cost is 



640^ (1) JV^ ) /. 



(ATW) 2 

T sft 



At higher stages, we evaluate the ^-statistic 



Fornax <^ 1, 



(i) 



jY 



(i-i) 



max < 1 



pf 



times, so the operations count is 



(A12) 
(A13) 



ir(i-i) 



N, 



(i) 



max • 



pc 



N, 



(i-i) 



max < 1, 



pf 

640JVWATW" 



AT( i_1 ) 



T sft 



(A14) 



[1] P.R. Brady, T. Creighton, C. Cutler, and B.F. Schutz, (2000). 

Phys. Rev. D57 2101 (1998). [3] P. Jaranowski, A. Krolak, and B.F. Schutz, Phys. Rev. 

[2] P.R. Brady and T. Creighton, Phys. Rev. D61, 082001 D58 063001 (1998). 



19 



[4] B. Abbott et al. (The LIGO Scientfic Collaboration), 

Phys. Rev. D69, 082004 (2004). 
[5] R. Prix and Y. Itoh, gr-qc/0504006. 
[6] B. Krishnan, A.M. Sintes, M.A. Papa, B.F. Schutz, 

S. Frasca, and C. Palomba, Phys. Rev. D70, 082001 

(2004). 

[7] The software can be found on the following websites: 
http : //www. lsc-group .phys .uwm.edu/daswg/projects/ 
lal.html and http://www.lsc-group.phys.uwm.edu/ 
daswg/projects/lalapps .html 

[8] B. Owen, Phys. Rev. D53, 6749 (1996). 
[9] J.H. Conway and N.J. A. Sloane, Sphere Packings, Lat- 
tices and Groups, Springer (1991). 

[10] P. Astone, K.M. Borkowski, P. Jaranowski, and 
A. Krolak, Phys. Rev. D65 042003 (2003). 

[11] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, 
E. Teller, J. Chem. Phys. 21, 1087 (1953). 

[12] S. Kirkpatrick, CD. Gelatt Jr., and M.P. Vecchi, Science 
220, No. 4598, 671 (1983). 

[13] J.A. Nelder and R. Mead, Comput. J. 7, 308-313, 1965. 

[14] W.H. Press, Numerical Recipes in C, Cambridge Univer- 
sity Press (2002). 

[15] http : //www . gnu . org/ software/ gsl/. 

[16] S.V Dhurandhar and A. Vecchio, Phys. Rev. D63 122001 
(2001). 

[17] http : //www.physics2005 . org/events/einsteinathome/ 
#einsteinathome 



[18] J. R. Gair, L. Barack, T. Creighton, C. Cutler, S. L. 
Larson, E. S. Phinney, and M. Vallisneri, Proceedings 
of the Eighth GWDAW Meeting (Milwaukee, 2003); gr- 
qc/0405137. 

[19] Of course an actual search should must take into account 
the so-called Einstein and Shapiro delays, but these are 
unimportant for the question of how the search is most 
efficiently organized, which is the focus of this paper. 

[20] An exception is a search for old pulsars presented in ta- 
bles VII and VIII, where the cost for the follow-up stage 
turns out to be non-negligible. This example in particu- 
lar will have to be revisited when better formulas for N p 
are available. 

[21] Strictly speaking, Eq. (30) for the number of templates 
is valid only for observation times which are significantly 
less than a year. However, unless the discrepancy is many 
orders of magnitude, the numbers obtained show that it 
is obviously sufficient for the purposes of this argument. 

[22] Of course, no sensible person would ever claim that he 
or she had detected a GW pulsar with FA probability of 
less than 10~ 284 . In such a case, the "statistical error" is 
so ridiculously small that the true FA rate is dominated 
by the other, hard-to-quantify factors, such as the prob- 
ability of having some bug in the instrumentation or in 
the data analysis code. 



